ABSTRACT



A system and method for seamlessly combining client-only 
rendering techniques with server-only rendering techniques. 
The approach uses a composite stream containing three 
distinct streams. Two of the streams are synchronized and 
transmit camera definition, video of server-rendered objects, 
and a time dependent depth map for the server-rendered 
object. The third stream is available to send geometry from 
the server to the client, for local rendering if appropriate. 
The invention can satisfy a number of viewing applications. 
For example, initially the most relevant geometry can stream 
to the client for high quality local rendering while the server 
delivers renderings of less relevant geometry at lower reso- 
lutions. After the most relevant geometry has been delivered 
to the client, the less important geometry can be optionally 
streamed to the client to increase the fidelity of the entire 
scene. In the limit, all of the geometry is transferred to the 
client and the situation corresponds to client-only rendering 
system where local graphics hardware is used to improve 
fidelity and reduce bandwidth. Alternatively, if a client does
not have local three-dimensional graphics capability then the 
server can transmit only the video of the server-rendered 
object and drop the other two streams. In either case, the 
approach also permits for a progressive improvement in the 
server-rendered image whenever the scene becomes static. 
Bandwidth that was previously used to represent changing 
images is allocated to improving the fidelity of the server- 
rendered image whenever the scene becomes static. 

38 Claims, 13 Drawing Sheets 
METHODS AND APPARATUS FOR
DELIVERING 3D GRAPHICS IN A 
NETWORKED ENVIRONMENT 

CROSS-REFERENCE TO RELATED
APPLICATION 

The subject matter of this application is related to the
disclosure of co-pending U.S. patent application Ser. No.
09/411,312 filed Oct. 4, 1999, by Paul Borrel, Shawn Hall,
William P. Horn, James T. Klosowski, William L. Luken,
Ioana M. Martin, and Frank Suits for "Methods and
Apparatus for Delivering 3D Graphics in a Networked
Environment Using Transparent Video" and assigned to a common
assignee herewith. The disclosure of co-pending U.S. patent
application Ser. No. 09/411,312 is incorporated herein by
reference.

DESCRIPTION
BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention generally relates to graphics pro- 
cessing and display systems and, more particularly, to the 
creation and presentation of three-dimensional scenes of 
synthetic content stored on distributed network sources and
accessed by computer network transmission. The invention 
further relates to methods of adaptively selecting an optimal 
delivery strategy for each of the clients based on available 
resources. 

2. Background Description 

Using three-dimensional graphics over networks has 
become an increasingly effective way to share information, 
visualize data, design components, and advertise products. 
As the number of computers in the consumer and commer- 
cial sectors with network access increases, the number of 
users accessing some form of three-dimensional graphics is 
expected to increase accordingly. For example, it has been 
estimated by W. Meloni in "The Web Looks Toward 3D", 
Computer Graphics World, 21(12), December 1998, pp. 20 
et seq., that by the end of year 2001, 152.1 million personal 
computers (PCs) worldwide will have an Internet connec- 
tion. Out of this number, approximately 52.3 million users 
will frequently access three-dimensional images while on 
the World Wide Web (WWW or the Web). This number 
compares to only 10 million users accessing three- 
dimensional Web images in 1997 out of a total of 79 million 
Internet users. However, the use of three-dimensional graph- 
ics over networks is not limited to consumer applications. In 
1997, roughly 59% of all U.S. companies had intranet 
connections. By 2001 this figure is expected to jump to 80%. 
This transition includes three-dimensional collaboration 
tools for design and visualization. For instance, within the 
computer-aided design (CAD) community there is signifi- 
cant interest in applications which permit sharing on a global 
basis of three-dimensional models among designers, 
engineers, suppliers and other interested parties across a 
network. The capability to perform "visual collaborations" 
offers the promise to reduce costs and to shorten develop- 
ment times. Other corporate interests target the use of 
three-dimensional solutions to visualize data such as finan- 
cial fluctuations, client accounts, and resource allocations. 

As generally shown in FIG. 1, three-dimensional models 
and their representations are typically stored on centralized
servers 100 and are accessed by clients 101 over commu- 
nication networks 102. Several data-transfer technologies 
have been developed over the past few years to visualize 
three-dimensional models over networks. 




At one end of the spectrum are the so-called client-side
rendering methods in which the model is downloaded to the
client which is entirely responsible for its rendering. FIG. 2
shows a diagram of a typical client-side rendering
architecture. Upon input from a user or another application 201,
the client 202 requests, via network 203 as client feedback 204,
a model from the server 205. The geometry server 210
within server 205 contains the 3d geometry 211 and the
scene parameters 212. In response to client feedback 204,
the server 205 retrieves the model from storage 206 and
delivers the 3d geometry 213 to the client 202 over the
network 203. Once the model has been received by the
client, the client 3d browser 208 renders it in client rendering
engine 207 and displays it on the display 209. Additional
client feedback may follow as the user interacts with the
model displayed and more information about the model is
downloaded. Such methods typically require a considerable
amount of time to download and display on the client an
initial meaningful representation of a complex
three-dimensional model. These methods also require the
existence of three-dimensional graphics capabilities on the
client machines.

Alternatives to en masse downloading of a model without
prior processing include storage and transmission of
compressed models, as reported by G. Taubin and J. Rossignac
in "Geometry Compression Through Topological Surgery",
ACM Transactions on Graphics, April 1998, pp. 84-115,
streaming and progressive delivery of the component
geometry, as reported by G. Taubin et al. in "Progressive
Forest Split Compression", ACM Proc. Siggraph '98, July
1998, pp. 123-132, H. Hoppe in "Progressive Meshes",
ACM Proc. Siggraph '96, August 1996, pp. 99-108, and M.
Garland and P. Heckbert in "Surface Simplification Using
Quadric Error Bounds", ACM Proc. Siggraph '97, August
1997, pp. 209-216, and ordering based on visibility, as
reported by D. Aliaga in "Visualization of Complex Models
Using Dynamic Texture-Based Simplification", Proc. IEEE
Visualization '96, October 1996, pp. 101-106, all of which
are targeted towards minimizing the delay before the client
is able to generate an initial display. However, producing
such representations may involve significant server computing
and storage resources, the downloading time remains
large for complex models, and additional time may be
necessary on the client to process the data received (e.g.,
decompression). For example, Adaptive Media's Envision
3D (see www.envision.com) combines computer graphics
visibility techniques (e.g., occlusion culling as described by
H. Zang et al., "Visibility Culling Using Hierarchical
Occlusion Maps", ACM Proc. Siggraph '97, August 1997, pp.
77-88) with streaming to guide the downloading process by
sending to the clients the visible geometry first and
displaying it as it is received, rather than waiting for the entire
model to be sent. Nonetheless, determining which geometry
is visible from a given viewpoint is not a trivial computation
and maintaining acceptable performance remains a
challenging proposition even when only visible geometry is
transmitted.

At the opposite end of the spectrum are server-side
rendering methods, as generally shown in FIG. 3, which
place the burden of rendering a model entirely on the server
and the images generated are subsequently transmitted to
clients. As in the case of client-side methods, the client 301
usually initiates a request for a model. However, instead of
downloading the three-dimensional model to the client 301,
the model and scene description 302 stored in storage 303 is
rendered on the server 304 in rendering engine 305 to
produce two-dimensional static images 306, and one or
more two-dimensional images 307 resulting from this
rendering are transmitted over the network 308 to the client
301. Subsequently, the images 307 are displayed on display
309 of the client 301. The cycle is then repeated based on
user feedback 310.

Such techniques have the advantages that they do not
require any three-dimensional graphics capabilities on the
part of the clients and the bandwidth requirements are
significantly reduced. The tradeoffs in this case are the loss
of real-time interaction with the model (i.e., images cannot
be delivered to clients at interactive frame rates) and the
increase in server load and hence, server response times, as
the number of clients concurrently accessing the server
increases. An example of a server-side-based rendering
system is CATWeb (www.catia.ibm.com) which is a web
browser-based application designed to provide dynamic
CAD data access to users with intranet connections and
graphics capabilities. Another example in this category is
panoramic rendering described by W. Luken et al. in
"PanoramIX: Photorealistic Multimedia 3D Scenery", IBM
Research Report #RC21145, IBM T. J. Watson Research
Center, 1998. A panorama is a 360 degree image of a scene
around a particular viewpoint. Several panoramas can be
created for different viewpoints in the scene and connected
to support limited viewpoint selection.

Hybrid rendering methods described by D. Aliaga and A.
Lastra in "Architectural Walkthroughs Using Portal
Textures", Proc. IEEE Visualization '97, October 1997, pp.
355-362, M. Levoy in "Polygon-Assisted JPEG and MPEG
Compression of Synthetic Images", ACM Proc. Siggraph
'95, August 1995, pp. 21-28, and Y. Mann and D. Cohen-Or
in "Selective Pixel Transmission for Navigating in Remote
Virtual Environments", Proc. Eurographics '97, 16 (3),
September 1997, pp. 201-206, provide a compromise
approach by rendering part of a complex model on the server
(usually components that are far away from the viewer or of
secondary interest) and part on the client. Thus, a
combination of images (possibly augmented with depth
information) and geometry is delivered to the client. For
example, the background of a three-dimensional scene may
be rendered on the server as a panorama with depth
information at each pixel. Foreground objects are delivered as
geometry to the client and correctly embedded into the
panorama using the depth information. The main advantage
of such an approach is that the time to transmit and display
on the client the server-rendered parts of the model is
independent of the scene complexity, while the frame rate
and the interaction with the client-rendered parts are
improved. Additional processing of the image and geometry
data may be done to optimize their transfer over the network.
For instance, in M. Levoy, supra, image compression is
applied to the two-dimensional data and model
simplification and compression are performed on the
three-dimensional data before they are sent to the client. Some of
the disadvantages of hybrid rendering methods are the fact
that determining whether a part of a given model should be
rendered on the server or on the client is usually not a trivial
task, extra image information is often required to fill in
occlusion errors that may occur as a result of a viewpoint
change on the client, and limited user interaction.

Although the subject has been addressed by B. O.
Schneider and I. Martin in "An Adaptive Framework for 3D
Graphics in Networked and Mobile Environments", Proc.
Workshop on Interactive Applications of Mobile Computing
(IMC'98), November 1998, in general, commercial methods
for delivering three-dimensional data over networks are not
adaptive. They do not take into account dynamic changes in
system environment conditions such as server load, client
capabilities, available network bandwidth, and user
constraints. In addition, the lack of standards and the increasing
complexity of the models have contributed to limiting the
success of existing technologies.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide
a system and method which provides a continuous, seamless
spectrum of rendering options between server-only
rendering and client-only rendering.

Another object of the invention is to provide a
user-controlled tradeoff between the quality (fidelity) of the
rendered image and the frame rates at which the rendered
images are displayed on the client.

It is yet another object of the invention to provide a
system and method which provides rendering options that
adaptively track a dynamic network environment.

Yet another object of this invention is to provide a system
and method that uses dead reckoning techniques to avoid
latency problems in a network.

According to the invention, there is provided a novel
approach to the problem of seamlessly combining
client-only rendering techniques with server-only rendering
techniques. The approach uses a composite stream containing
three distinct streams. Two of the streams are synchronized
and transmit camera definition, video of server-rendered
objects, and a time dependent depth map for the
server-rendered object. The third stream is available to send
geometry from the server to the client, for local rendering if
appropriate.

The invention can satisfy a number of viewing
applications. For example, initially the most relevant geometry can
stream to the client for high quality local rendering while the
server delivers renderings of less relevant geometry at lower
resolutions. After the most relevant geometry has been
delivered to the client, the less important geometry can be
optionally streamed to the client to increase the fidelity of
the entire scene. In the limit, all of the geometry is
transferred to the client and the situation corresponds to
client-only rendering system where local graphics hardware is used
to improve fidelity and reduce bandwidth. Alternatively, if a
client does not have local three-dimensional graphics
capability then the server can transmit only the video of the
server-rendered object and drop the other two streams. In
either case, as an additional feature, the approach permits for
a progressive improvement in the server-rendered image
whenever the scene becomes static. Bandwidth that was
previously used to represent changing images is allocated to
improving the fidelity of the server-rendered image
whenever the scene becomes static.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, aspects and advantages
will be better understood from the following detailed
description of a preferred embodiment of the invention with
reference to the drawings, in which:

FIG. 1 is a block diagram showing a prior art client-server
architecture;

FIG. 2 is a block diagram showing prior art of client-side
rendering;

FIG. 3 is a block diagram showing prior art of server-side
rendering;

FIG. 4 is a block diagram showing an overview of a
typical networking environment using the present invention;
FIG. 5 is a block diagram showing the descriptor
generation component of the invention;

FIG. 6 is a block diagram showing the client feedback
components in the invention;

FIG. 7 is a block diagram showing the server components
responsible for processing the client feedback;

FIG. 8A is a diagram illustrating prior art client
rendering bandwidth requirements;

FIG. 8B is a diagram illustrating prior art server rendering
bandwidth requirements;

FIG. 9A is a diagram illustrating server-side bandwidth
requirements for the present invention;

FIG. 9B is a diagram illustrating the mixed client-side and
server-side bandwidth requirements for the present
invention;

FIG. 9C is a diagram illustrating client-side bandwidth
requirements for the present invention;

FIG. 10 is a block diagram defining H.323 extensions;

FIG. 11 is a flow chart of the dead reckoning process;

FIG. 12 is a flow chart of the "zideo" server; and

FIG. 13 is a flow chart of the "zideo" client.

DETAILED DESCRIPTION OF PREFERRED
EMBODIMENTS OF THE INVENTION

This invention is a system which provides a continuous,
seamless spectrum of rendering options between server-only
rendering and client-only rendering. The system adaptively
chooses a particular rendering option to accommodate
system factors such as:

available network bandwidth,

client three-dimensional graphics capabilities, central
processing unit (CPU) capabilities, and CPU load;

server three-dimensional graphics capabilities, CPU
capabilities, and CPU load;

display image size;

eye position used for rendering;

scene complexity (for example number of connected
components, number of triangles, and so forth);

depth complexity;

division of geometry between the foreground and the
background; and

the number of pixels per triangle.

The present invention is a system for generating and
delivering rendered images of synthetic content, consisting of one
or a plurality of three-dimensional geometric models, across
a computer network. The system uses a server computer and
a client computer and permits the rendering of one or several
geometric models on the server computer, on the client
computer, or a combination of the two, for the purposes of
visualizing and interacting with the three-dimensional
geometric models on the client.

The approach utilizes a composite stream containing three
distinct streams. Two of the streams are synchronized and
are used for transmitting camera parameters, video of the
server-rendered objects, and a time-dependent depth map for
the server-rendered objects. The third stream is used to send
geometry from the server to the client, for local rendering.

Several novel features of the present invention are the
methods used by the client to perform the compositing
operation. In one method, the z-buffer, or depth map,
information generated by the server is compared to the z-buffer
information generated by the client to decide, for each pixel
in the final image, whether to use the client-rendered pixel
or the server-rendered pixel. In another method, the z-buffer
information is transmitted in compressed form.

The present invention is particularly useful in applications
involving a large, centrally-located CAD database with
many client computers of varying graphics capabilities
accessing one or several models over computer networks of
variable bandwidths. The invention can also be used,
however, to satisfy a number of viewing applications. For
example, initially the most relevant geometry can be
streamed to the client for high quality local rendering, while
the server delivers renderings of less relevant geometry at
lower resolutions. After the most relevant geometry has been
delivered to the client, the less important geometry can be
optionally streamed to the client to increase the fidelity of
the entire scene. In the limit, all of the geometry is
transferred to the client and this situation corresponds to
client-only rendering systems where local graphics hardware is
used to improve fidelity and reduce bandwidth.
Alternatively, if a client does not have local
three-dimensional graphics capability, the server can transmit only
the video of the server-rendered objects and drop the other
two streams. In either case, as an additional feature, the
approach permits for a progressive improvement in the
server-rendered image whenever the camera is no longer
being manipulated by the client, and the scene becomes
static. Bandwidth that was previously used to represent
changing images is allocated to improving the fidelity of the
server-rendered image whenever the scene becomes static.

FIG. 4 is a block diagram showing an overview of a
typical networking environment using the present invention.
The figure shows a system 400 comprising a server
computer 401, a computer network 402, and a client
computer 403. The server 401 further comprises a disk 405
where one or a plurality of geometric models are stored, and
a descriptor generating system 406. The descriptor
generating system 406 contains a rendering system 407 and a
multiplexer 408. The rendering system 407 contains a
three-dimensional facility 409 for processing scenes of
three-dimensional geometric models, and feeds systems 410, 411
and 412 that support three different output types. The
zideo system 410 generates image and related z-buffer
information, called zideo, which may be compressed.
Zideo information consists of video and z-buffer
information. The three-dimensional system 411 generates
streamed three-dimensional geometry. The camera system
412 maintains the parameters describing the camera. The
server 401, and in particular the descriptor generating
system 406, are described in greater detail in FIG. 5.

The network 402 in this environment is responsible for
passing descriptors 413 from the server computer 401 to the
client computer 403, as well as passing feedback 414 from
the client computer 403 back to the server 401. Descriptors
413 is a term used to describe what is being sent from the
server to the client as well as the actual data that is being
transmitted. For example, the descriptors 413 can indicate
that the server is sending only images, in the case of
server-only rendering; only geometry, in the case of
client-only rendering; or images, z-buffer information, and camera
parameters, in the case of server and client rendering. The
feedback 414 information that is being sent from the client
403 to the server 401 is a means for the client 403 to specify
what it would like the server 401 to do. For example, the
client 403 could indicate specific components of the
geometric models in disk 405 that it would like the server 401
to send to it for local rendering, or it could tell the server 401
to send higher, or lower, quality images. The feedback 414
mechanism used by the present invention is described in
greater detail in FIGS. 6 and 7.
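
The first compositing method described above, in which the
server's z-buffer is compared with the client's z-buffer for
each pixel, can be illustrated with a minimal Python sketch.
The function name, the use of nested lists for images, the
convention that a smaller depth value is nearer to the camera,
and the choice to favor the client pixel on ties are all
assumptions made for illustration; the disclosure does not
specify them. The sketch also assumes both images were rendered
with the same camera and have the same dimensions.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # RGB color value
Image = List[List[Pixel]]      # rows of pixels
DepthMap = List[List[float]]   # rows of depth values, smaller = nearer (assumed)


def composite(server_rgb: Image, server_z: DepthMap,
              client_rgb: Image, client_z: DepthMap) -> Image:
    """For each pixel, keep whichever rendering is nearer to the camera."""
    final: Image = []
    for s_row, sz_row, c_row, cz_row in zip(server_rgb, server_z,
                                            client_rgb, client_z):
        row = []
        for s_px, s_depth, c_px, c_depth in zip(s_row, sz_row, c_row, cz_row):
            # Ties go to the client-rendered pixel (an arbitrary choice).
            row.append(c_px if c_depth <= s_depth else s_px)
        final.append(row)
    return final


# One row, two pixels: the client object is in front at the first
# pixel, the server object is in front at the second.
RED, BLUE = (255, 0, 0), (0, 0, 255)
result = composite([[RED, RED]], [[0.5, 0.2]],
                   [[BLUE, BLUE]], [[0.3, 0.9]])
```

In the example, `result` holds the client's blue pixel followed
by the server's red pixel, because each was nearer at its
position.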
A descriptor realization system 415 resides on the client
computer 403, where the descriptors 413, sent via the 
network 402, are utilized to visualize the synthetic content. 
The descriptor realization system 415 consists of a demul- 
tiplexer 416, which splits the incoming stream of data into 
separate streams, and forwards the streams to either the 
rendering system 417, the zideo decoder 418, or to the local 
camera 431 within the user interface 430. 
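
The routing performed by the demultiplexer 416 can be sketched
in Python as follows. This is only an illustration under stated
assumptions: the `StreamKind`, `Packet`, and `Demultiplexer`
names, the packet layout, and the use of in-memory queues are
hypothetical and are not taken from the disclosure, which does
not define a wire format here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class StreamKind(Enum):
    """The three streams carried by the composite stream."""
    ZIDEO = "zideo"        # forwarded to the zideo decoder 418
    GEOMETRY = "geometry"  # forwarded to the rendering system 417
    CAMERA = "camera"      # forwarded to the local camera 431


@dataclass
class Packet:
    kind: StreamKind
    payload: bytes  # hypothetical opaque payload


def _empty_queues() -> Dict[StreamKind, List[Packet]]:
    return {kind: [] for kind in StreamKind}


@dataclass
class Demultiplexer:
    """Splits the incoming composite stream into one queue per stream."""
    queues: Dict[StreamKind, List[Packet]] = field(default_factory=_empty_queues)

    def push(self, packet: Packet) -> None:
        # Route each packet to the queue of its destination subsystem.
        self.queues[packet.kind].append(packet)


demux = Demultiplexer()
for pkt in (Packet(StreamKind.ZIDEO, b"frame-0"),
            Packet(StreamKind.CAMERA, b"cam-0"),
            Packet(StreamKind.ZIDEO, b"frame-1")):
    demux.push(pkt)
```

After the loop, the zideo queue holds two packets, the camera
queue one, and the geometry queue none, which corresponds to a
session in which no geometry is being streamed.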

If geometric models are being sent to the client 403, the
streamed three-dimensional geometry 411 and the camera
parameters 412 are sent to the client's rendering system
417. The geometry is then rendered on the client 403 using
the camera 420 and the framebuffer is read to compute the
RGB (red, green, blue) color image values 421
and the z-buffer information 422. The outputs are then sent
to the compositor 419.

If zideo 410 has been sent to the client 403, it is forwarded
by the splitter 416 to the decoder 418. The decoder 418
separates the RGB image values 423 from the z-buffer
information 424, and passes the output to the compositor
419. In the case of server-only rendering, the zideo 410
would not contain any z-buffer information and the video
sent from the server would be sent immediately to the
compositor 419.

If camera parameters 412 are sent to the client, the splitter
416 also forwards these parameters to the user interface 430.

The compositor 419 accepts as input the image 421 and
z-buffer information 422 from the client rendering system
417, image 423 and the z-buffer information 424 from the
server. It is not necessarily the case that all of these input
values are actually present all of the time. In server-only
rendering, the compositor 419 would only accept the image
423 from the decoder 418. For client-only rendering, the
compositor 419 would only need to accept the image 421
from the client rendering system 417. In these extreme
cases, the compositor 419 has little to do other than to pass
the final image 435 along to the display 432 for the user to
see. It is only when the synthetic content is a combination of
server and client rendering that the compositor 419 has
actual work to do. In this case, the compositor 419 needs to
determine, for each pixel in the final image that will be
displayed for the user, whether to use the corresponding
pixel generated on the server 401 or on the client 403. This
decision is based upon several factors, including the z-buffer
information 422 and 424 and the relationship between
camera parameters on the server 412 and the client 420.

If the camera parameters 412 on the server 401 and the
client 403 are within a specified tolerance level, then the
z-buffer information 422 and 424 will typically be used to
determine whether to use the server 401 or the client 403
rendered pixel. However, if there is a significant difference
in the camera parameters, the system can choose to ignore
the server-rendered images, and only display the
client-rendered images to prevent the user from becoming
disoriented.

The output of the compositor 419 is an image 435
presented to the user on the computer display 432. The user
interface 430 is a mechanism for the user to send feedback
414 to the server. For example, if the user wishes to visualize
the geometric models from a different viewpoint, updated
camera parameters can be sent back to the server 401.
Additional information can also be passed back to the server
401 through this interface. Feedback 414 sent from the client
403 to the server 401 is further discussed in FIGS. 6 and 7.

FIG. 5 is a block diagram showing the descriptor
generation component 406 of the current invention. Recall that the
server 401 is comprised of a disk 405 used to store geometric
models, and a descriptor generating system 406, for
generating synthetic content to be sent across a computer network
402. The descriptor generating system 406 is further broken
down into a rendering system 407 and a multiplexer 408,
which is used for combining the zideo 410, s3d 411, and
camera outputs 412 produced by the rendering system 407.

The rendering system 407 contains a three-dimensional
facility 409 for processing scenes of three-dimensional
geometric models. The three-dimensional facility 409
manages the data that is being visualized, by loading it into the
main memory of the computer and by handling requests
from clients who may wish to make modifications, e.g.,
transformations, to the scene of geometric models. The
three-dimensional facility 409 also passes the geometric data
to the "zideo" system 410 and the three-dimensional system
411.

Using the camera parameter 412 of the server 401, the
renderer 500 of zideo system 410 renders geometric models
passed to it by the three-dimensional facility 409. The
rendered images 501 may then be sent to the computer
display 432 on the client 403, although this is not required.
After the geometry has been rendered, the framebuffer is
read and the RGB image 501 and the z-buffer, or depth,
information 503 is passed to the zideo system's compress
and stamp subsystem 504. The compress and stamp
subsystem 504 is responsible for timestamping the information
that is being passed from the renderer 500 and eventually to
the multiplexer 408. The timestamping is required to enable
the client 403 to synchronize the data that is being received
over the network 402. The image 501 and z-buffer
information 503 can also be compressed to reduce the bandwidth
required across the network 402. After timestamping and
compression are done, the output of the zideo system, called
"zideo" out 505, is passed to the multiplexer 408. The rate
506 functionality is provided as a means for the compress
and stamp subsystem 504 to pass feedback to the renderer
500, for instance, if the images 501 are being passed too
quickly for the compressor 504 to keep up.

The three-dimensional system 411 generates streamed
three-dimensional geometry. Initially the geometry is passed
to the three-dimensional system 411 from the
three-dimensional facility 409. The geometry is then partitioned
507 into smaller pieces of data which are then ordered 508
according to a priority scheme, which may or may not be
influenced by the client 403. Once the pieces of data have
been partitioned 507 and ordered 508, they may be
compressed 509 and sent as three-dimensional out 510 to the
multiplexer 408.

The camera out system 511 passes the parameters
describing the server camera, in block 412, to the multiplexer 408
to be sent to the client 403. The camera 412 is required by
the renderer 500 and may optionally be modified on the
server 401, although typically this is not the case.

A quality of service, or QOS, system 512 is part of the
descriptor generating system 406 also. The QOS system 512
interprets some of the feedback 414 sent from the client 403
to the server 401. The QOS system 512 can influence the
rendering system 407, by going through the compress and
stamp subsystem 504 and the rate function mechanism 506,
and also the three-dimensional system 411. For example,
when sending images across a network, there is typically a
tradeoff between fidelity and frame rate. In other words, high
quality images require more time to produce and therefore
the number of images sent in a fixed amount of time, also
called the frame rate, decreases. Similarly, low quality
images can be produced much faster and therefore the client
receives images at a much higher frame rate. Thus, one form
of feedback from the client would be to indicate the desired
quality of the images it wishes to receive, or the frame rate 
at which it would like to receive the images. 
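
The tradeoff between fidelity and frame rate can be made
concrete with a small worked example in Python. The sketch
assumes, purely for illustration, that the network link is the
only bottleneck and that image quality is captured by the number
of bits per frame; the text above also attributes the tradeoff
to the time needed to produce images, which this sketch does not
model. The function names and the numbers are hypothetical.

```python
def frame_rate(bandwidth_bps: float, bits_per_frame: float) -> float:
    """Frames per second a link can carry if it is the only bottleneck."""
    if bandwidth_bps <= 0 or bits_per_frame <= 0:
        raise ValueError("bandwidth and frame size must be positive")
    return bandwidth_bps / bits_per_frame


def frame_budget(bandwidth_bps: float, target_fps: float) -> float:
    """Bits available per frame when the client asks for a given frame rate."""
    if bandwidth_bps <= 0 or target_fps <= 0:
        raise ValueError("bandwidth and frame rate must be positive")
    return bandwidth_bps / target_fps


LINK_BPS = 1_000_000  # hypothetical 1 Mbit/s link

high_quality_fps = frame_rate(LINK_BPS, 100_000)  # larger, higher-fidelity frames
low_quality_fps = frame_rate(LINK_BPS, 25_000)    # smaller, lower-fidelity frames
budget_at_20_fps = frame_budget(LINK_BPS, 20)     # bits per frame at 20 frames/s
```

Under these assumptions the higher-fidelity frames arrive at 10
frames per second and the lower-fidelity frames at 40, and a
client that requests 20 frames per second leaves 50,000 bits for
each frame. A request for a frame rate and a request for a
quality level are therefore two ways of expressing the same
constraint.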

FIG. 6 is a block diagram showing the client feedback 
components in the current invention. The feedback direction 
600 indicates that the feedback 414 is from the client 403, 
in particular the descriptor realization system 415, to the 
server 401. Within the descriptor realization system 415, 
there are three systems that can provide feedback to the 
server: the compositor 419, the demultiplexer 416, and the 
user interface mechanism 430. The compositor 419 can
affect the quality 602 of the descriptors 413 that are being
sent to the client 403. For example, the compositor 419
knows at what frame rate 506 the images 435 are being
displayed for the user, and therefore the compositor 419 can 
inform the server 401 that it should send images 501 faster 
if it is not keeping up with the current frame rate. The 
demultiplexer or splitter 416 sends feedback to the server 
401 in the form of error correction 603. This particular 
feedback mechanism is prior art and involves the reliable 
delivery of content from the server 401 to the client 403. The 
reliable delivery can be accomplished, for example, by using 
TCP (Transmission Control Protocol) or using reliable UDP 
(User Datagram Protocol). The user input mechanism 430 
also affects the quality 602 of the descriptors 413 sent to the 
client 403, as well as traditional user feedback 601 in which
the camera position is modified by the client 403. There are 
additional scenarios in which user feedback 601 is sent to the 
server 401, and these are discussed in FIG. 7. The quality 
feedback 602 can also allow the user to specify to the server 
401 whether to send better quality images or additional 
geometry to be rendered locally. 
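
The three feedback categories of FIG. 6 can be sketched as a small message type, together with the compositor rule described above (ask for faster frames when arrivals lag the display rate). This is a minimal illustration only: the names `FeedbackKind`, `Feedback`, and `compositor_feedback`, the payload keys, and the 5% tolerance are assumptions and do not appear in the specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FeedbackKind(Enum):
    # Values reuse the reference numerals of FIG. 6 purely as labels.
    USER = 601              # camera or scene changes requested by the user
    QUALITY = 602           # frame-rate or fidelity requests
    ERROR_CORRECTION = 603  # reliable-delivery traffic


@dataclass
class Feedback:
    kind: FeedbackKind
    payload: dict


def compositor_feedback(display_fps: float, arrival_fps: float,
                        tolerance: float = 0.05) -> Optional[Feedback]:
    """Return a quality request when frames arrive slower than they are shown.

    The tolerance (an assumed value) avoids sending feedback for small jitter.
    """
    if arrival_fps < display_fps * (1.0 - tolerance):
        return Feedback(FeedbackKind.QUALITY,
                        {"request": "faster_frame_rate",
                         "target_fps": display_fps})
    return None
```

In this sketch the compositor stays silent while the stream keeps up, so feedback traffic is only generated when the server needs to change its behavior.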

FIG. 7 is a block diagram showing the server components 
responsible for processing the client feedback 414. The 
direction of feedback 700 continues to point from the client 
403 to the server 401. As indicated originally in FIG. 6, the 
three categories of client feedback are error correction 603, 
user feedback 601, and quality 602. The error correction 
feedback 603, involving prior art reliable delivery 
requirements, is handled by the multiplexer 408. User feed- 
back 601 is passed back to a multitude of systems, described 
as follows. The user can indicate a change in the geometric 
model scene, for example by transforming the location of a 
particular model. Such a request is handled by the
three-dimensional facility 409. The user can modify the camera
parameters, a change that is processed by the camera out
system 511. A request to change the size or resolution of the image
would be processed directly by the renderer 500. The final 
type of user feedback 601 consists of requests for specific 
components of the geometric models to be sent from the 
server 401 to the client 403, if, for instance, the client 403 
wishes to inspect a particular part of a larger assembly. Such 
requests are handled by the three-dimensional system 411. 
Quality is handled by the quality of service (QOS) mecha- 
nism 512. The QOS mechanism 512 communicates with the 
compress and stamp subsystem 504 and the three- 
dimensional system 411. 
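
The routing of FIG. 7 amounts to a lookup from feedback category to the server component that handles it. The sketch below restates that mapping as a table; the string keys and the `route` function are hypothetical names chosen for illustration, while the component names and numerals are taken from the description above.

```python
# (category, request) -> handling component, as described for FIG. 7.
ROUTES = {
    ("error_correction", None): "multiplexer 408",
    ("user", "scene_change"): "three-dimensional facility 409",
    ("user", "camera"): "camera out system 511",
    ("user", "image_size"): "renderer 500",
    ("user", "geometry_request"): "three-dimensional system 411",
    ("quality", None): "QOS mechanism 512",
}


def route(category, request=None):
    """Return the component that processes a piece of client feedback.

    Only user feedback is subdivided by request type; the other two
    categories each go to a single component.
    """
    key = (category, request if category == "user" else None)
    try:
        return ROUTES[key]
    except KeyError:
        raise ValueError(
            f"unrecognized feedback: {category!r}, {request!r}") from None
```

For example, `route("user", "camera")` yields the camera out system 511, while `route("quality")` yields the QOS mechanism 512, which in turn communicates with the compress and stamp subsystem 504 and the three-dimensional system 411.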

FIG. 8A is a diagram illustrating prior art bandwidth
requirements for client rendering. The Virtual Reality
Modeling Language, or VRML, approach involves client-only
rendering. Three elements are traditionally sent across the
network in the prior art: geometry 211, image 306, and
feedback 204. The geometric models 211 are sent across the
network 203 and the client 202 must wait until all
information has been received, unless clever progressive
transmission strategies have been used. Once the geometry is
located locally and is being rendered on the client 202, only
occasional feedback 204 to the server is necessary. In the other
extreme, shown in FIG. 8B, that of server-only rendering,
the CATWeb approach sends images 306 to the client 301
occasionally, only after receiving feedback 310 from the
client 301 to indicate, for example, a change in camera
parameters, or a request to visualize a different geometric
model.

FIG. 9A is a diagram illustrating bandwidth requirements
for the present invention. Three elements are sent across the
network in the present invention: streamed geometry 411,
zideo 410, and feedback 414. As shown in FIG. 9A, the
server-only rendering approach within the present invention
is identical to that of the CATWeb approach of FIG. 8B.
Images 501 are sent to the client 403 occasionally, only after
receiving feedback 414 from the client. The client-only
rendering, shown in FIG. 9C, in the present invention is
different than the prior art described in FIGS. 2 and 8A. In
this case, a combination of zideo 410 and streamed geometry
411 is sent to the client 403 so that some visualization can
occur immediately. Once all of the streamed geometry 411
has been obtained by the client 403, no further information
is needed from the server 401. In between the two extremes,
the server and client renderings can be mixed, as shown in
FIG. 9B. Images 501 and depth information portion of zideo
503 are initially sent with streamed geometry 411 until all of
the desired geometry has been loaded on the client 403.
Then, only zideo 410 is sent to augment the client-side
rendering, as determined by the feedback 414 sent to the
server 401.
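
The three cases of FIGS. 9A to 9C differ only in which server-to-client streams are active before and after the desired geometry has arrived. The sketch below tabulates that behavior; the mode strings and the `active_streams` function are hypothetical, and feedback 414 (client to server) is left out because it is available in every case.

```python
def active_streams(mode, geometry_loaded):
    """Return the set of server-to-client streams in use.

    mode is one of 'server_only' (FIG. 9A), 'mixed' (FIG. 9B),
    or 'client_only' (FIG. 9C). geometry_loaded is True once all of the
    desired streamed geometry has reached the client.
    """
    if mode == "server_only":
        # Images only, sent occasionally in response to feedback.
        return {"images"}
    if mode == "client_only":
        # Zideo plus geometry for immediate viewing, then nothing further.
        return set() if geometry_loaded else {"zideo", "geometry"}
    if mode == "mixed":
        # Zideo plus geometry at first, then zideo alone augments the
        # client-side rendering.
        return {"zideo"} if geometry_loaded else {"zideo", "geometry"}
    raise ValueError(f"unknown mode: {mode!r}")
```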

FIG. 10 is a block diagram which highlights a possible
extension to the H.323 standard. The International
Telecommunication Union (ITU) is an organization that sets
standards for multimedia communications. H.323 is a
well-established standard within the community of audio, video,
and data communications across networks such as the
Internet. The shaded region in FIG. 10 shows a possible
extension to the H.323 standard, whereby using the present
invention, sending synthetic content, such as zideo and
geometry, could also be included in the standard.

FIG. 11 is a flow chart of the dead reckoning process
based on the presence of clocks on the server and client.
Initial synchronization occurs when streaming begins at the
server and the server clock is reset to zero 1102 prior to
content creation, compression, and transmission. The client
clock is reset 1113 after fully receiving and decompressing
the first frame. The client and server clocks are therefore not
synchronized in real time, but content created for display at
time T and time stamped accordingly will automatically be
available at time T of the client's clock after transmission
and decompression. An error signal can thereafter be fed
back from the client to the server indicating the error in the
arrival time of a frame and its time stamp, allowing dynamic
modifications to the server clock to keep its delivery of
media in synch with the client.

When interaction occurs on the client side, the gesture and
its client time stamp are sent to the server and used in a
predictor algorithm to begin tracking the motion requested.
Network delays on both trips between client and server will
be accommodated and the media streamed from the server to
the client, after a brief delay, will be in synch with the
requested motion.
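
The predictor algorithm itself is not specified here. As one possible illustration, the sketch below assumes a rotation gesture with constant angular velocity and extrapolates the angle from the gesture's client time stamp to the time at which the resulting frame will be presented. The function name, the parameters, and the constant-velocity model are all assumptions.

```python
def predict_angle(angle_deg, velocity_deg_per_s, gesture_time,
                  presentation_time):
    """Extrapolate a rotation angle to the frame's presentation time.

    Both times are on the client clock. The interval between them covers
    the round trip (gesture to server, frame back to client), so rendering
    at the predicted angle keeps the streamed frame in step with the motion.
    """
    dt = presentation_time - gesture_time
    return (angle_deg + velocity_deg_per_s * dt) % 360.0
```

A gesture reported at client time 1.0 s with the model at 10 degrees and turning at 90 degrees per second would, for a frame to be shown at 1.5 s, be rendered at 55 degrees rather than at the stale 10 degrees.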

More specifically, steps 1101-1108 illustrate the steps
done on the server side. The process starts 1101 after
initiation by the server or upon request from the client. In
function block 1102, the server timer is reset. In function
block 1103, the time is calculated for the next object using
known delay. Initially this is approximate, but once feedback
1118 begins arriving from the client this value will be
refined. Then, in function block 1104, parameters are
calculated for the next object based on its anticipated
presentation time. This includes the time it takes the server to
create the object and the time it takes to deliver the object to
the client. In function block 1105, the object is created using
the parameters calculated in 1104. In function block 1106,
the object is stamped with its time and other parameters. In
function block 1107, any post-processing of the object, such
as compression, is done. Then, in step 1108, the object is
delivered to the client over the network.
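
Function block 1103 and the refinement driven by feedback 1118 can be sketched as a running delay estimate. The class below is an illustration under stated assumptions: the name `ServerScheduler`, the proportional correction, and the gain of 0.5 are not taken from the specification, which says only that the initially approximate value is refined once feedback arrives.

```python
class ServerScheduler:
    """Tracks the estimated create-plus-deliver delay for the next object."""

    def __init__(self, initial_delay, gain=0.5):
        self.delay = initial_delay  # initial, approximate delay in seconds
        self.gain = gain            # assumed fraction of the error to apply

    def next_presentation_time(self, server_now):
        # Block 1103: presentation time = current server time + known delay.
        return server_now + self.delay

    def on_feedback(self, error):
        # Feedback 1118: error = time the object was actually ready on the
        # client minus its stamped presentation time (positive means late).
        # The estimate is nudged toward the observed delay, never below zero.
        self.delay = max(0.0, self.delay + self.gain * error)
```

Applying only part of each reported error is a design choice of this sketch: it damps the response to a single delayed frame while still converging when the network latency shifts.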

Steps 1109-1117 show the steps on the client side. In step
1109, the client receives the object from the network, and
pre-processes the object in function block 1110. In function
block 1111, the client extracts time and other properties
associated with the object. In decision block 1112, a
determination is made whether the object received is the first object.
If the object received is the first object, then the client timer
is reset in function block 1113. The server resets its timer
before creating the first object, and the client resets its timer
on receipt of the first object. If the object is not the first
object, then in step 1118, the difference between the
presentation time stamped on the object and the actual local time
the object was ready for presentation is fed back to the server
over the network. Then, in function block 1114, local content
is created with the same parameters, which is to be
embedded in the server content. In function block 1115, the local
content is merged with the remote content. In step 1116, the
client waits until the intended presentation time. Then, in
step 1117, the scene containing merged content from the
client and the server is displayed.
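
The client-side timing decisions (blocks 1112 and 1113, and steps 1118 and 1116) can be sketched as follows. The class name `ClientSync` and the method signature are hypothetical, and the sketch assumes the first object carries a presentation stamp of zero so that resetting the client timer on its receipt aligns the two clocks.

```python
class ClientSync:
    """Client timer handling for objects that are ready for presentation."""

    def __init__(self):
        self.offset = None  # wall-clock time at which the timer was reset

    def on_object_ready(self, stamp, wall_now):
        """Return (feedback_error, wait_seconds) for one received object.

        stamp is the presentation time stamped by the server; wall_now is
        the local wall-clock time at which the object became ready.
        """
        if self.offset is None:
            # Blocks 1112/1113: first object, reset the client timer.
            self.offset = wall_now
            local_now = 0.0
            error = None                # no feedback for the first object
        else:
            local_now = wall_now - self.offset
            error = local_now - stamp   # step 1118: positive means late
        wait = max(0.0, stamp - local_now)  # step 1116: hold until due
        return error, wait
```

An object that is ready early produces a negative error and a positive wait; a late object produces a positive error and is presented immediately.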

FIG. 12 shows a flow chart of the Zideo Server process.
The process starts in step 1201, and is initiated either by the
server or upon request from the client. In function block
1202, the scene is loaded from models in storage 1213. In
step 1203, the scene is divided into two regions, one of
which will be sent as geometry to the client 1208, and the
other will remain on the server and be sent as RGB image
plus depth 1204. In decision block 1211, a determination is
made whether there is a camera available from the client
1210. If yes, the current camera from the client 1210 is used.
Otherwise, a default initial camera 1212 is used. In function
block 1204, using the camera from 1210 or 1212, and region
1 1204, an RGB and depth image of the scene is created. In
function block 1205, the frame is marked with descriptor
information such as the camera used, time, and frame
number. In function block 1206, RGB and depth are
compressed and merged into a single zideo image. Then, in step
1207, zideo frames of region 1 are streamed to the client. In
function block 1208, the geometry for region 2 is
compressed, and streamed to the client 1209 until it has all
been sent.
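
Function blocks 1205 and 1206 can be illustrated by joining the color image and its depth map side by side into one frame and attaching the descriptor. The sketch below is an assumption-laden example: the function names, the dictionary layout, and the 8-bit grey quantization of depth are not from the specification, and no actual compression is performed.

```python
def pack_zideo_frame(rgb, depth, near, far):
    """Join color and depth side by side into a single image.

    rgb is a list of rows of (r, g, b) tuples; depth is a list of rows of
    floats. Depth is clamped to [near, far] and quantized to 0..255
    (an assumed encoding), then stored as grey pixels to the right of
    the color pixels.
    """
    if far <= near:
        raise ValueError("far must be greater than near")
    if len(rgb) != len(depth) or any(
            len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("rgb and depth must have the same dimensions")
    span = far - near
    packed = []
    for rgb_row, depth_row in zip(rgb, depth):
        grey = []
        for z in depth_row:
            q = round(255 * (min(max(z, near), far) - near) / span)
            grey.append((q, q, q))
        packed.append(list(rgb_row) + grey)
    return packed


def stamp_frame(image, camera, time, frame_number, near, far):
    """Block 1205: attach the camera, time, and frame number."""
    return {"camera": camera, "time": time, "frame": frame_number,
            "depth_range": (near, far), "image": image}
```

Carrying the depth range with each frame lets the client map the quantized values back to scene depth before merging with locally rendered geometry.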

FIG. 13 shows a flow chart of the process of the Zideo
Client. In step 1301, the client receives the zideo stream from
the server. In function block 1302, the compressed zideo
frames are extracted as the stream arrives. In function block
1303, the RGB and depth information are decompressed,
and the depth 1304 and RGB image 1312 are extracted. In
function block 1311, descriptors from the zideo frame, e.g.
camera parameters, are extracted. In function block 1309,
the compressed geometry stream 1308 from the server is
decompressed. In function block 1310 an RGB image is
created of the compressed geometry stream 1308. The
current camera is utilized if there is one; otherwise the
camera used by the server to create the zideo is used. In step
1305, the RGB zideo frame is overlaid on top of the
geometry image created by 1310 using depth. This can be
done by explicitly comparing the depth values of the two
images and using whichever pixel is closer, or by directly
rendering the compressed geometry into the RGB+depth
frame. In function block 1306, user interaction with the
scene makes the camera parameters change due to rotation,
zooming, etc. In step 1307, the new camera parameters are
sent back to the server for use in the following zideo frame
renderings, and fed back to 1311 so that the new local
camera is used to render the compressed geometry stream in
1310.
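
The first merging option of step 1305, explicitly comparing depth values and keeping whichever pixel is closer, can be sketched as below. The function name is hypothetical, and the sketch assumes that both images share one camera and resolution, that a smaller depth value means closer to the viewer, and that ties go to the server pixel.

```python
def merge_by_depth(server_rgb, server_depth, local_rgb, local_depth):
    """Composite two images pixel by pixel, keeping the closer surface.

    Each argument is a list of rows; the rgb rows hold color tuples and
    the depth rows hold one depth value per pixel.
    """
    merged = []
    for s_row, sz_row, l_row, lz_row in zip(
            server_rgb, server_depth, local_rgb, local_depth):
        row = []
        for s_px, s_z, l_px, l_z in zip(s_row, sz_row, l_row, lz_row):
            # Smaller depth is assumed to be nearer the camera.
            row.append(s_px if s_z <= l_z else l_px)
        merged.append(row)
    return merged
```

The second option described above, rendering the local geometry directly into the RGB+depth frame, achieves the same result by loading the server depth into the local depth buffer so that the ordinary depth test performs this comparison.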

While the invention has been described in terms of 
preferred embodiments, those skilled in the art will recog- 
nize that the invention can be practiced with modification 
within the spirit and scope of the appended claims. 

Having thus described our invention, what we claim as 
new and desire to secure by Letters Patent is as follows: 

1. A computer imaging system comprising: 

a central processing unit (CPU), at least one memory, and 
a network interface to one or more networks; 

at least one scene model stored in said at least one 
memory, each said at least one scene model having at 
least one first part and at least one second part, each 
said at least one first part having a first three- 
dimensional geometric model and each said at least one 
second part having a second three-dimensional geo- 
metric model; 

means for converting the first three-dimensional geomet- 
ric model into a first two-dimensional image with depth 
information; 

means for providing the position of the first two- 
dimensional image with respect to the second three- 
dimensional geometric model; and 

means for transmitting the second three-dimensional 
model, the first two-dimensional image, the depth 
information, and the position of the first two- 
dimensional image with respect to the second three- 
dimensional geometric model through the network 
interfaces to the network. 

2. The computer imaging system according to claim 1,
wherein said converting means and said position means 
reside on a server. 

3. The computer imaging system according to claim 2, 
wherein said position means provides at least one of the 
following position parameters: a viewpoint, an orientation, a 
width, a depth, and a range. 

4. The computer imaging system according to claim 1, 
wherein the transmitting means further transmits the first 
three-dimensional geometric model. 

5. The computer imaging system according to claim 1, 
wherein a client receiving at least the second three- 
dimensional geometric model, the first two-dimensional 
image, the depth information, and the position of the first 
two-dimensional image with respect to the second three- 
dimensional geometric model transmits a quality of service 
message to the server via the network interface. 

6. The computer imaging system according to claim 5, 
where the quality of service message includes at least one of 
a stop, a request for a faster frame rate of the first two- 
dimensional image, a request for a faster frame rate of the 
depth information, an improved resolution of the first two- 
dimensional image, a request for a slower frame rate of the 
first two-dimensional image, a request for a slower frame 
rate of the depth information, a lower resolution of the first 
two-dimensional image, a bit rate for the first 3-dimensional 
geometric model, a delay message, and a delay message that 
controls a clock.

7. The computer imaging system as recited in claim 6
further comprising means for merging geometry rendered 
locally on the client with the depth information received 
from the server based on the depth value for each pixel. 

8. The computer imaging system as recited in claim 7 
further comprising means for compressing and streaming the 
client-rendered scene geometry that allows reconstruction of 
the geometry by the client as the streamed geometry stream 
arrives. 

9. The computer imaging system as recited in claim 8 
further comprising means for compressing a color and depth 
image stream by one or more of the following: 

intraframe compression of the color and depth indepen- 
dently as individual frames; 

interframe compression of the color and depth as separate 
animations; and 

interframe compression of the color and depth joined 
together into a single animation of the color and depth 
frames side by side or top to bottom. 

10. The computer imaging system as recited in claim 9 
further implementing a dynamic compression mode and 
comprising: 

means for the server to determine whether client view
parameters and scene contents are changing;

means for the server to begin sending individual frames
that have successively higher resolution in at least one
of color or depth;

means for the server to begin sending frames that, when
merged, produce a progressively higher and higher
resolution in at least one of color or depth; and

means for the server to detect changes in client view
parameters or scene contents and begin streaming low
resolution color and depth frames.

11. The computer imaging system as recited in claim 10 
further comprising: 

means for providing user interaction commands with each
of said at least one scene model;

means for communicating the user interaction commands
to the server;

means for enabling the server to communicate to the 
client a depth range of each frame to allow merging the 
client-rendered scene geometry into the server- 
rendered frames; and 

means for the server to communicate to the client the view 
parameters of each frame. 

12. The computer imaging system as recited in claim 11 
wherein the view parameters include at least one of view 
point, view orientation, view frustum, and use of perspec- 
tive. 

13. The computer imaging system as recited in claim 12 
further comprising means for synchronizing client and 
server content and accommodating latency due to at least 
one of network delays, compression time, and decompres- 
sion time. 

14. The computer imaging system as recited in claim 13, 
wherein said synchronizing means comprises: 

independently running client and server clocks; 

means for initially synchronizing said client and server
clocks to accommodate latency on the server, network,
and client;

means for the server to communicate to the client a 
timestamp for each frame that aids in synchronizing 
frames that arrive on time, and rejecting or delaying 
frames that do not arrive on time; and 

means for providing feedback from the client to the server
regarding the measured error in the arrival time of the
frames and their timestamp to dynamically adapt to
latencies in the system and their changes.

15. The computer imaging system as recited in claim 13 
further comprising: 

a user interaction mode that allows predictive rendering
by the server; and

means for the server to compensate for client-server
latency by using a deduced time lag and said user
interaction mode to pre-render images so they arrive at
the client on time.

16. The computer imaging system as recited in claim 15, 
wherein said user interaction mode enables a user to interact 
with respect to at least one of rotation about an axis, motion 
along a path through space, panning, and zooming.

17. A computer implemented method for interactively 
using three dimensional models across a network, compris- 
ing the steps of: 

storing at least one scene model stored in at least one 
memory of a computer, wherein each said at least one 
scene model has at least one first part and at least one 
second part, and each said at least one first part has a 
first three-dimensional geometric model and each said 
at least one second part has a second three-dimensional 
geometric model; 

converting the first three-dimensional geometric model 
into a first two-dimensional image with depth informa- 
tion; 

providing the position of the first two-dimensional image
with respect to the second three-dimensional geometric
model; and

transmitting the second three-dimensional geometric
model, the first two-dimensional image, the depth
information, and the position of the first
two-dimensional image with respect to the second
three-dimensional geometric model through the network
interfaces to the network.

18. The computer implemented system according to claim
17, wherein said converting means resides on a server.

19. The computer implemented method as recited in claim
17, wherein the step of providing the position of the first
two-dimensional image with respect to the second
three-dimensional geometric model provides at least one of the
following position parameters: a viewpoint, an orientation, a
width, a depth, and a range.

20. The computer implemented method as recited in claim 
17, wherein the transmitting step further transmits the first 
three-dimensional geometric model. 

21. The computer implemented method as recited in claim
17, further comprising the step of transmitting a quality of
service message to the server via the network interface.

22. The computer implemented method as recited in claim
21, wherein the quality of service message includes at least
one of: a stop, a request for a faster frame rate of the first
two-dimensional image, a request for a faster frame rate of
the depth information, an improved resolution of the first
two-dimensional image, a request for a slower frame rate of
the first two-dimensional image, a request for a slower frame
rate of the depth information, a lower resolution of the first
two-dimensional image, a bit rate for the first
three-dimensional geometric model, a delay message, and a delay
message that controls a clock.

23. The computer implemented method as recited in claim
22, further comprising the steps of:

streaming the geometry of all, part, or none of each of said
at least one scene model from a remote server machine
to a local client machine;

streaming two-dimensional animations of all or part of
each of said at least one scene model from the server to
the client in a form that includes a depth value for each
pixel.

24. The computer implemented method as recited in claim
23 further comprising the step of merging geometry
rendered locally on the client with the depth information
received from the server based on the depth value for each
pixel.

25. The computer implemented method as recited in claim
24 further comprising the step of compressing and streaming
the client-rendered scene geometry for allowing
reconstruction of the geometry by the client as the streamed geometry
stream arrives.

26. The computer implemented method as recited in claim
24 further comprising the step of compressing a color and
depth image stream.

27. The computer implemented method as recited in claim 
26, wherein the color and image stream are compressed by 
one or more of the following techniques: 

intraframe compression of the color and depth
independently as individual frames;

interframe compression of the color and depth as separate
animations; and

interframe compression of the color and depth joined
together into a single animation of the color and depth
frames side by side or top to bottom.

28. The computer implemented method as recited in claim
26 further comprising the steps of:

determining whether client view parameters and scene
contents are changing;

prompting the server to begin sending individual frames
that have successively higher resolution in at least one
of color or depth;

prompting the server to begin sending frames that, when
merged, produce a progressively higher and higher
resolution in at least one of color or depth; and

detecting changes in client view parameters or scene
contents and begin streaming low resolution color and
depth frames.

29. The computer implemented method as recited in claim
28 further comprising the steps of:

providing user interaction commands with each of said at
least one scene model;

communicating the user interaction commands to the
server;

enabling the server to communicate to the client a depth
range of each frame to allow merging the
client-rendered scene geometry into the server-rendered
frames; and

communicating to the client the view parameters of each 
frame. 

30. The computer implemented method as recited in claim
29 wherein the view parameters include at least one of: view
point, view orientation, view frustum, and use of
perspective.

31. The computer implemented method as recited in claim
30 further comprising the step of synchronizing client and
server content and accommodating latency due to at least
one of network delays, compression time, and
decompression time.

32. The computer implemented method as recited in claim
31, wherein said synchronizing means comprises:

providing independently running client and server clocks;

synchronizing said client and server clocks to
accommodate latency on the server, network, and client;

communicating to the client a timestamp for each frame
that aids in synchronizing frames that arrive on time,
and rejecting or delaying frames that do not arrive on
time; and

providing feedback from the client to the server regarding
the measured error in the arrival time of the frames and
their timestamp to dynamically adapt to latencies in the
system and their changes.

33. The computer implemented method as recited in claim 
31 further comprising the step of: 

providing a user interaction mode that allows predictive
rendering by the server; and

compensating for client-server latency by using a deduced
time lag.

34. The computer implemented method as recited in claim 
33, wherein the step of providing the user interaction mode 
enables a user to interact with respect to at least one of: 
rotation about an axis, motion along a path through space, 
panning, and zooming. 

35. A computer program product comprising a computer 
usable medium having computer readable program code 
embodied in the medium for processing digital images, the 
computer program product having: 

first computer program code for storing at least one scene 
model in at least one memory of a computer, wherein 
each of the at least one scene model has at least one first 
part and at least one second part, wherein each of the 
at least one first part has a first three-dimensional 
geometric model and each of the at least one second 
part has a second three-dimensional geometric model; 

second computer program code for converting the first 
three-dimensional geometric model into a first two- 
dimensional image with depth information, wherein the 
depth information is used to determine whether the 
two-dimensional image is in front or behind the second 
three-dimensional geometric model; 

third computer program code for providing the position of 
the first two-dimensional image with respect to the 
second three-dimensional geometric model; and 

fourth computer program code for transmitting the second 
three-dimensional geometric model, the first two- 
dimensional image, the depth information, and the 
position of the first two-dimensional image with respect 
to the second three-dimensional geometric model 
through the network interfaces to the network. 

36. A computer program product according to claim 35, 
further comprising: ninth computer program code for merg- 
ing geometry rendered locally on the client with the depth 
information received from the server based on the depth 
value for each pixel. 

37. A computer program product according to claim 36, 
further comprising: 

tenth computer program code for compressing and 
streaming the client-rendered scene geometry that 
allows reconstruction of the geometry by the client as 
the streamed geometry stream arrives. 

38. A computer program product according to claim 37, 
further comprising: 

eleventh computer program code for compressing a color 
and depth image stream by one or more of the follow- 
ing: 

intraframe compression of the color and depth inde- 
pendently as individual frames; 

interframe compression of the color and depth as 
separate animations; and 

interframe compression of the color and depth joined 
together into a single animation of the color and 
depth frames side by side or top to bottom. 

* * * * *